Simplify unique constraint for documents #1098

Merged
hackerwins merged 1 commit into main from rm-partial-idx
Dec 11, 2024
@hackerwins hackerwins commented Dec 11, 2024

What this PR does / why we need it:

Simplify unique constraint for documents

This commit refines the project's document indexing strategy so that the unique key constraint still holds when documents are soft-deleted. Previously this was handled with a unique constraint on a partial index, but a partial index cannot be used when sharding. By adding the removed_at field to the uniqueness check, we now ensure that only one active document exists at a time.
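A minimal sketch of the lifecycle this implies, assuming the index key is (project_id, key, removed_at) and that an active document has no removed_at value. The store type, its methods, and the use of 0 to mean "no removed_at" are hypothetical stand-ins for illustration, not code from this PR:

```go
package main

import (
	"errors"
	"fmt"
)

// docKey mirrors the assumed compound unique key. RemovedAt is a Unix
// timestamp; 0 stands in for a missing or null removed_at (active document).
type docKey struct {
	ProjectID string
	Key       string
	RemovedAt int64
}

var errDuplicate = errors.New("duplicate key")

// store is a hypothetical in-memory stand-in for the collection and its
// unique index: every stored document occupies exactly one docKey.
type store struct {
	index map[docKey]bool
}

func newStore() *store { return &store{index: map[docKey]bool{}} }

// create inserts an active document and fails if an active document with
// the same project and key already exists.
func (s *store) create(projectID, key string) error {
	k := docKey{ProjectID: projectID, Key: key}
	if s.index[k] {
		return errDuplicate
	}
	s.index[k] = true
	return nil
}

// softDelete stamps removed_at on the active document, which moves it to a
// different index entry and frees the "active" slot for that key.
func (s *store) softDelete(projectID, key string, removedAt int64) error {
	active := docKey{ProjectID: projectID, Key: key}
	if !s.index[active] {
		return errors.New("no active document")
	}
	removed := docKey{ProjectID: projectID, Key: key, RemovedAt: removedAt}
	if s.index[removed] {
		return errDuplicate
	}
	delete(s.index, active)
	s.index[removed] = true
	return nil
}

func main() {
	s := newStore()
	fmt.Println(s.create("p1", "doc-a"))                 // <nil>
	fmt.Println(s.create("p1", "doc-a"))                 // duplicate key
	fmt.Println(s.softDelete("p1", "doc-a", 1733900000)) // <nil>
	fmt.Println(s.create("p1", "doc-a"))                 // <nil>
}
```

The second create fails because the active slot is taken; after the soft delete the same key can be created again, since the removed document now carries a distinct removed_at.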

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation:


Checklist:

  • Added relevant tests or not required
  • Addressed and resolved all CodeRabbit review comments
  • Didn't break anything

Summary by CodeRabbit

  • New Features

    • Enhanced indexing for the ColDocuments collection to enforce uniqueness across specified keys.
  • Bug Fixes

    • Resolved issues related to the uniqueness constraint on documents in the ColDocuments collection.


coderabbitai bot commented Dec 11, 2024

Walkthrough

The changes in this pull request involve modifications to the index definition for the ColDocuments collection in the server/backend/database/mongo/indexes.go file. A new key, removed_at, has been added to the index, and the uniqueness constraint has been altered to enforce uniqueness across all documents without filtering based on the removed_at value. The overall structure of the index configuration remains unchanged, and the function responsible for ensuring indexes continues to operate as before.

Changes

File Path Change Summary
server/backend/database/mongo/indexes.go Modified indexes for ColDocuments to include a new key removed_at and changed uniqueness options to enforce uniqueness across all documents.
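To make the difference between the two index shapes concrete, here is a small model of both rules. It assumes the old partial filter selected documents with no removed_at value; the doc type and both function names are hypothetical, and 0 stands in for a missing removed_at:

```go
package main

import "fmt"

// doc is a hypothetical, pared-down document. RemovedAt == 0 stands in for
// a missing or null removed_at field (an active document).
type doc struct {
	ProjectID, Key string
	RemovedAt      int64
}

// uniqueUnderPartialIndex models the assumed old rule: unique on
// (project_id, key), checked only for documents without removed_at.
func uniqueUnderPartialIndex(docs []doc) bool {
	seen := map[[2]string]bool{}
	for _, d := range docs {
		if d.RemovedAt != 0 {
			continue // soft-deleted documents are outside the partial index
		}
		k := [2]string{d.ProjectID, d.Key}
		if seen[k] {
			return false
		}
		seen[k] = true
	}
	return true
}

// uniqueUnderCompoundIndex models the new rule: unique on
// (project_id, key, removed_at), checked for every document.
func uniqueUnderCompoundIndex(docs []doc) bool {
	seen := map[doc]bool{}
	for _, d := range docs {
		if seen[d] {
			return false
		}
		seen[d] = true
	}
	return true
}

func main() {
	// One active document plus two older soft-deleted versions.
	typical := []doc{{"p1", "a", 0}, {"p1", "a", 100}, {"p1", "a", 200}}
	// Two soft-deleted documents that share the same removed_at.
	sameStamp := []doc{{"p1", "a", 100}, {"p1", "a", 100}}

	fmt.Println(uniqueUnderPartialIndex(typical), uniqueUnderCompoundIndex(typical))     // true true
	fmt.Println(uniqueUnderPartialIndex(sameStamp), uniqueUnderCompoundIndex(sameStamp)) // true false
}
```

Under these assumptions the two rules agree on active documents and on soft-deleted documents with distinct timestamps; they differ only when two soft-deleted documents share the same project_id, key, and removed_at.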


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
server/backend/database/mongo/indexes.go (1)

114-116: Consider documenting the uniqueness constraint change

The change in uniqueness behavior is significant enough to warrant documentation, especially regarding:

  • The rationale behind removing the partial filter
  • The impact on document lifecycle management
  • Any required changes to application code that deals with document deletion

Consider adding a comment above the index definition explaining the uniqueness constraint and its implications.

 		name: ColDocuments,
+		// Unique constraint on project_id/key/removed_at ensures at most one
+		// active document (no removed_at) exists per project_id/key combination
 		indexes: []mongo.IndexModel{{
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c9a86db and 6c83108.

📒 Files selected for processing (1)
  • server/backend/database/mongo/indexes.go (1 hunks)
🔇 Additional comments (1)
server/backend/database/mongo/indexes.go (1)

114-116: Verify data integrity before applying this change

The change to include removed_at in the unique index while removing the partial filter expression significantly alters the uniqueness constraint behavior:

  • Previously: Allowed duplicate project_id/key combinations for soft-deleted documents
  • Now: Enforces uniqueness of the project_id/key/removed_at combination across all documents, with no partial filter

Run this script to check for potential conflicts:

Consider the following before proceeding:

  1. Migration Strategy: Existing data might need cleanup if there are soft-deleted documents with duplicate project_id/key combinations
  2. Application Logic: Verify if any features depend on the ability to have duplicates for soft-deleted documents
  3. Rollback Plan: Have a plan to revert if issues are discovered post-deployment
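As a rough illustration of the migration check in item 1, the sketch below groups documents by the new compound key and reports every group with more than one member. It runs on an in-memory slice rather than against MongoDB; the document and indexKey types and the findConflicts function are hypothetical names, and 0 stands in for a missing removed_at:

```go
package main

import (
	"fmt"
	"sort"
)

// document is a hypothetical, pared-down record. RemovedAt is a Unix
// timestamp; 0 stands in for a missing or null removed_at.
type document struct {
	ID        string
	ProjectID string
	Key       string
	RemovedAt int64
}

// indexKey is the assumed compound unique key of the new index.
type indexKey struct {
	ProjectID string
	Key       string
	RemovedAt int64
}

// findConflicts groups documents by compound key and returns the sorted IDs
// of every group that holds more than one document. A non-empty result means
// the unique index could not be built on this data without cleanup.
func findConflicts(docs []document) map[indexKey][]string {
	groups := map[indexKey][]string{}
	for _, d := range docs {
		k := indexKey{ProjectID: d.ProjectID, Key: d.Key, RemovedAt: d.RemovedAt}
		groups[k] = append(groups[k], d.ID)
	}
	for k, ids := range groups {
		if len(ids) < 2 {
			delete(groups, k) // deleting during range is safe in Go
			continue
		}
		sort.Strings(ids)
	}
	return groups
}

func main() {
	docs := []document{
		{"d1", "p1", "a", 0},
		{"d2", "p1", "a", 100},
		{"d3", "p1", "a", 100}, // same key and removed_at as d2
		{"d4", "p1", "b", 0},
	}
	for k, ids := range findConflicts(docs) {
		fmt.Printf("%+v -> %v\n", k, ids)
	}
}
```

An empty result suggests the existing data already satisfies the new constraint; any reported group would need one of its documents changed or removed first.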

Would you like help creating a migration script to identify and resolve potential conflicts?

✅ Verification successful

Let me gather more information about the document collection's usage and soft deletion patterns.


Let me search for more specific patterns related to document uniqueness and soft deletion behavior.


Based on the search results and code analysis, I can now generate a final response:

Unique index change appears safe to implement

The change to include removed_at in the unique index while removing the partial filter is safe because:

  1. The codebase shows no patterns of relying on duplicate project_id/key combinations for soft-deleted documents
  2. The only reference to removed_at uniqueness constraints is in design docs which align with this change
  3. Document operations (Insert/Update/Find) don't show any logic depending on the ability to have duplicates when soft-deleted

The change also tightens data integrity slightly: duplicate project_id/key combinations are now rejected among soft-deleted documents as well whenever they share the same removed_at value.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential duplicate project_id/key combinations that were previously allowed
# but would violate the new constraint

# Search for any MongoDB queries that might be affected by this change
rg -A 3 "removed_at.*null" 

# Search for soft-delete related code that might assume duplicates are allowed
ast-grep --pattern 'removed_at = $_ WHERE project_id = $_ AND key = $_'

Length of output: 661


Script:

#!/bin/bash
# Search for document collection operations and soft deletion patterns
# Look for document creation/update logic
rg -A 5 "ColDocuments.*Insert" 
rg -A 5 "ColDocuments.*Update"

# Look for soft deletion implementation
rg -A 5 "removed_at.*time\." 
rg -A 5 "SoftDelete"

# Check if there are any tests related to document uniqueness
rg -A 5 "TestDocument.*Unique"

Length of output: 3179


Script:

#!/bin/bash
# Search for document uniqueness checks and constraints
rg -A 5 "duplicate.*document" -i
rg -A 5 "ErrDuplicate"

# Look for document find operations that might be affected
rg -A 5 'ColDocuments.*Find.*removed_at'

# Search for any migration scripts or schema changes
fd -e js -e go -H migration
rg -A 5 "migrate.*document" -i

Length of output: 67121


codecov bot commented Dec 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.84%. Comparing base (c9a86db) to head (6c83108).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1098   +/-   ##
=======================================
  Coverage   46.84%   46.84%           
=======================================
  Files          84       84           
  Lines       12256    12256           
=======================================
  Hits         5741     5741           
  Misses       5939     5939           
  Partials      576      576           
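The figures in the table are consistent with each other. A quick check, assuming coverage is computed as hits divided by total lines (hits + misses + partials); coveragePercent is a hypothetical helper:

```go
package main

import "fmt"

// coveragePercent returns hits as a percentage of total lines, assuming
// total lines = hits + misses + partials.
func coveragePercent(hits, lines int) float64 {
	return float64(hits) / float64(lines) * 100
}

func main() {
	hits, misses, partials := 5741, 5939, 576
	lines := hits + misses + partials

	fmt.Println(lines)                                   // 12256
	fmt.Printf("%.2f%%\n", coveragePercent(hits, lines)) // 46.84%
}
```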


@hackerwins hackerwins merged commit 084cf41 into main Dec 11, 2024
5 checks passed
@hackerwins hackerwins deleted the rm-partial-idx branch December 11, 2024 06:03